Exon shuffling is a molecular mechanism for the formation of new genes. It is a process through which two or more exons from different genes can be brought together ectopically, or the same exon can be duplicated, to create a new exon-intron structure. Exon shuffling occurs through several mechanisms: transposon-mediated exon shuffling, crossover during sexual recombination of parental genomes, and illegitimate recombination.
Exon shuffling follows certain splice frame rules. Introns can interrupt the reading frame of a gene by inserting a sequence between two consecutive codons (phase 0 introns), between the first and second nucleotide of a codon (phase 1 introns), or between the second and third nucleotide of a codon (phase 2 introns). Additionally, exons can be classified into nine different groups based on the phase of the flanking introns (symmetrical: 0-0, 1-1, 2-2; asymmetrical: 0-1, 0-2, 1-0, 1-2, 2-0, 2-1). Symmetric exons are the only ones that can be inserted into introns, undergo duplication, or be deleted without changing the reading frame.
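The splice frame rules above can be sketched in a few lines of Python. This is a minimal illustration, not a published tool: the function names (`intron_phase`, `exon_class`, `downstream_phase`) are hypothetical, and the model assumes that an intron's phase is simply the number of coding nucleotides upstream of it, modulo three.

```python
from itertools import product

PHASES = (0, 1, 2)

def intron_phase(coding_bases_upstream: int) -> int:
    """Phase of an intron, given how many coding nucleotides precede it.

    0 = between two codons, 1 = after the first nucleotide of a codon,
    2 = after the second nucleotide of a codon.
    """
    return coding_bases_upstream % 3

def exon_class(upstream_phase: int, downstream_phase_: int) -> str:
    """Label an exon by the phases of its two flanking introns, e.g. '1-1'."""
    return f"{upstream_phase}-{downstream_phase_}"

def is_symmetric(upstream_phase: int, downstream_phase_: int) -> bool:
    """An exon is symmetric when both flanking introns share the same phase."""
    return upstream_phase == downstream_phase_

def downstream_phase(upstream_phase: int, exon_length: int) -> int:
    """Phase of the intron that follows an exon of the given length."""
    return (upstream_phase + exon_length) % 3

# All nine exon classes, and the three symmetric ones among them.
ALL_CLASSES = [exon_class(a, b) for a, b in product(PHASES, repeat=2)]
SYMMETRIC_CLASSES = [exon_class(a, b)
                     for a, b in product(PHASES, repeat=2)
                     if is_symmetric(a, b)]

print(ALL_CLASSES)
print(SYMMETRIC_CLASSES)
```

Under these definitions a symmetric exon always has a length divisible by three, which is why inserting, duplicating, or deleting it leaves the downstream reading frame unchanged.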
For exon shuffling to play a major role in protein evolution, spliceosomal introns first had to appear, because the self-splicing introns of the RNA world were unsuitable for exon shuffling by intronic recombination. These self-splicing introns had an essential function and therefore could not be recombined. Additionally, there is strong evidence that spliceosomal introns evolved fairly recently and are restricted in their evolutionary distribution. Exon shuffling therefore came to play a major role in the construction of younger proteins.
Moreover, to define more precisely when exon shuffling became significant in eukaryotes, the evolutionary distribution of modular proteins that evolved through this mechanism was examined in different organisms such as Escherichia coli, Saccharomyces cerevisiae, and Arabidopsis thaliana. These studies suggested that there was an inverse relationship between genome compactness and the proportion of intronic and repetitive sequences, and that exon shuffling became significant after the metazoan radiation.
The modularization hypothesis proposes a mechanism for the formation and shuffling of protein domains (modules). The mechanism is divided into three stages. The first stage is the insertion of introns at positions that correspond to the boundaries of a protein domain. The second stage is when the resulting "protomodule" undergoes tandem duplications by recombination within the inserted introns. The third stage is when one or more protomodules are transferred to a different, nonhomologous gene by intronic recombination. All stages of modularization have been observed in different domains, such as those of hemostatic proteins.
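The second and third stages of the modularization hypothesis can be pictured with a toy model in which a gene is just an ordered list of module names. The sketch below is only an illustration under that assumption: the gene contents and the helper names `tandem_duplicate` and `transfer_module` are hypothetical, and intronic recombination is reduced to simple list operations.

```python
from typing import List

def tandem_duplicate(gene: List[str], index: int, copies: int = 1) -> List[str]:
    """Stage 2: duplicate the protomodule at `index` in tandem.

    Returns a new gene with `copies` extra copies placed directly
    after the original module.
    """
    return gene[:index + 1] + [gene[index]] * copies + gene[index + 1:]

def transfer_module(donor: List[str], index: int,
                    recipient: List[str], position: int) -> List[str]:
    """Stage 3: copy one protomodule from a donor gene into a
    nonhomologous recipient gene at `position`.

    The donor is left unchanged; a new recipient gene is returned.
    """
    return recipient[:position] + [donor[index]] + recipient[position:]

# Hypothetical genes, written as ordered lists of modules.
donor_gene = ["signal", "moduleA", "catalytic"]
recipient_gene = ["signal", "binding"]

duplicated = tandem_duplicate(donor_gene, index=1, copies=2)
shuffled = transfer_module(donor_gene, index=1,
                           recipient=recipient_gene, position=1)

print(duplicated)
print(shuffled)
```

The functions return new lists rather than modifying their inputs, which mirrors the idea that the donor gene keeps its own copy of the module.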
Upon transposition, the long interspersed element L1 (LINE-1) associates with 3' flanking DNA and carries the non-L1 sequence to a new genomic location. This new location does not have to be in a homologous sequence or in close proximity to the donor DNA sequence. The donor DNA sequence remains unchanged throughout this process because L1 transposes in a copy-and-paste manner via an RNA intermediate; however, only regions located 3' of the L1 element have been shown to be targeted for duplication.
Nevertheless, there is reason to believe that this may not hold true every time, as shown by the following example. The human ATM gene is responsible for the human autosomal-recessive disorder ataxia-telangiectasia and is located on chromosome 11. However, a partial ATM sequence is found on chromosome 7. Molecular features suggest that this duplication was mediated by L1 retrotransposition: the derived sequence was flanked by 15-bp target site duplications (TSDs), the sequence around the 5' end matched the consensus sequence for the L1 endonuclease cleavage site, and a poly(A) tail preceded the 3' TSD. But since the L1 element was present in neither the retrotransposed segment nor the original sequence, the mobilization of the segment cannot be explained by 3' transduction. Additional information has led to the belief that trans-mobilization of the DNA sequence is another mechanism by which L1 shuffles exons, but more research on the subject is needed.
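Two of the hallmarks listed above, matching target site duplications and a poly(A) tail immediately before the 3' TSD, are easy to express as a string check. The following is a rough sketch rather than a real annotation method: the function name, the exact-match requirement for the two TSDs, and the `min_poly_a` threshold are all assumptions made for illustration, and the endonuclease cleavage site is not modeled.

```python
def has_retrotransposition_hallmarks(region: str,
                                     tsd_len: int = 15,
                                     min_poly_a: int = 6) -> bool:
    """Check a candidate region for two L1-style insertion hallmarks.

    `region` is expected to span 5' TSD + inserted segment + 3' TSD.
    `min_poly_a` is an arbitrary illustrative threshold, not a
    biological constant.
    """
    region = region.upper()
    if len(region) < 2 * tsd_len + min_poly_a:
        return False

    five_prime_tsd = region[:tsd_len]
    three_prime_tsd = region[-tsd_len:]
    insert = region[tsd_len:-tsd_len]

    # Count the run of A's at the 3' end of the inserted segment.
    poly_a_len = len(insert) - len(insert.rstrip("A"))

    return five_prime_tsd == three_prime_tsd and poly_a_len >= min_poly_a

# Hypothetical sequences for illustration only.
tsd = "ACGTACGTACGTACG"            # 15 bp
segment = "GGCCTTGGCC" + "A" * 12  # inserted sequence ending in poly(A)

print(has_retrotransposition_hallmarks(tsd + segment + tsd))
```

Real target site duplications can be imperfect, so a practical check would likely tolerate mismatches; exact matching is used here only to keep the example short.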
Helitron-encoded proteins are composed of a rolling-circle (RC) replication initiator (Rep) domain and a DNA helicase (Hel) domain. The Rep domain is involved in the catalytic reactions for endonucleolytic cleavage, DNA transfer, and ligation. In addition, this domain contains three motifs. The first motif is necessary for DNA binding. The second motif has two histidines and is involved in metal ion binding. Lastly, the third motif has two tyrosines and catalyzes DNA cleavage and ligation.
There are three models of gene capture by Helitrons: the "read-through" model 1 (RTM1), the "read-through" model 2 (RTM2), and a filler DNA model (FDNA). According to the RTM1 model, an accidental "malfunction" of the replication terminator at the 3' end of the Helitron leads to transposition of genomic DNA. The resulting element is composed of the read-through Helitron and its downstream genomic regions, flanked by a random DNA site that serves as a "de novo" RC terminator. According to the RTM2 model, the RC terminator also malfunctions, but the 3' terminus of another Helitron then serves as the RC terminator of transposition. Lastly, in the FDNA model, portions of genes or non-coding regions can accidentally serve as templates during the repair of double-stranded DNA breaks occurring in Helitrons. Even though Helitrons have been shown to be a very important evolutionary tool, the specific details of their mechanisms of transposition are yet to be defined.
An example of evolution driven by Helitrons is the diversity commonly found in maize. As transposable elements, Helitrons cause constant change in the genic and nongenic regions of the maize genome, leading to diversity among different maize lines.
LTR retrotransposons require an RNA intermediate in their transposition cycle. Retrotransposons synthesize a cDNA copy from the RNA strand using a reverse transcriptase related to retroviral RT. The cDNA copy is then inserted into a new genomic position to form a retrogene. This mechanism has been shown to be important in the gene evolution of rice and other grass species through exon shuffling.
There are two classes of illegitimate recombination (IR). The first corresponds to errors of enzymes that cut and join DNA (i.e., DNases). This process is initiated by a replication protein that helps generate a primer for DNA synthesis. While one DNA strand is being synthesized, the other is being displaced. The process ends when the displaced strand is joined at its ends by the same replication protein. The second class of IR corresponds to the recombination of short homologous sequences that are not recognized by the previously mentioned enzymes. However, they can be recognized by non-specific enzymes that introduce cuts between the repeats. The ends are then removed by an exonuclease to expose the repeats. The repeats then anneal, and the resulting molecule is repaired using polymerase and ligase.
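The net effect of the second class of IR can be illustrated with a toy string model. This sketch assumes, for illustration only, that annealing between two short direct repeats leaves a single copy of the repeat and removes the sequence between them; the function name and the example sequence are hypothetical, and the individual enzymatic steps (cutting, exonuclease resection, polymerase and ligase repair) are not modeled.

```python
def recombine_between_repeats(seq: str, repeat: str) -> str:
    """Collapse the first two non-overlapping copies of `repeat` into one.

    Models the outcome of annealing between short direct repeats:
    the first repeat copy and the intervening sequence are lost,
    and the second copy is kept. Returns `seq` unchanged if fewer
    than two copies are found.
    """
    first = seq.find(repeat)
    if first == -1:
        return seq
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        return seq
    return seq[:first] + seq[second:]

# Hypothetical sequence: two copies of a short repeat around a spacer.
repeat = "GATTC"
original = "AAAC" + repeat + "TTTTTT" + repeat + "CCCA"

product = recombine_between_repeats(original, repeat)
print(original)
print(product)
```

In this example the six-nucleotide spacer and one repeat copy are lost, so the product is eleven nucleotides shorter than the original sequence.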
Transposon mediated
Long interspersed element (LINE)-1
Helitron
Long-terminal repeat (LTR) retrotransposons
Transposons with Terminal inverted repeats (TIRs)
Illegitimate recombination